METHODS FOR IDENTIFYING DRUG TARGETS BASED ON GENOMIC SEQUENCE DATA



Background of the Invention 



Field of the Invention 

This invention relates to methods for identifying drug targets based on genomic sequence data. More 
specifically, this invention relates to systems and methods for determining suitable molecular targets for the directed 
development of antimicrobial agents. 
Description of the Related Art 

Infectious disease is on a rapid rise and threatens to regain its status as a major health problem. Prior to the 
discovery of antibiotics in the 1930s, infectious disease was a major cause of death. Further discoveries, development, 
and mass production of antibiotics throughout the 1940s and 1950s dramatically reduced deaths from microbial 
infections to a level where they effectively no longer represented a major threat in developed countries. 

Over the years antibiotics have been liberally prescribed and the strong selection pressure that this 
represents has led to the emergence of antibiotic resistant strains of many serious human pathogens. In some cases 
selected antibiotics, such as vancomycin, literally represent the last line of defense against certain pathogenic bacteria 
such as Staphylococcus. The possibility for staphylococci to acquire vancomycin resistance through exchange of 
genetic material with enterococci, which are commonly resistant to vancomycin, is a serious issue of concern to health 
care specialists. The pharmaceutical industry continues its search for new antimicrobial compounds, which is a lengthy
and tedious, but very important process. The rate of development and introduction of new antibiotics appears to no
longer be able to keep up with the evolution of new antibiotic resistant organisms. The rapid emergence of antibiotic 
resistant organisms threatens to lead to a serious widespread health care concern. 

The basis of antimicrobial chemotherapy is to selectively kill the microbe with minimal, and ideally no, harm 
to normal human cells and tissues. Therefore, ideal targets for antibacterial action are biochemical processes that are 
unique to bacteria, or those that are sufficiently different from the corresponding mammalian processes to allow 
acceptable discrimination between the two. For effective antibiotic action it is clear that a vital target must exist in 
the bacterial cell and that the antibiotic be delivered to the target in an active form. Therefore resistance to an 
antibiotic can arise from: (i) chemical destruction or inactivation of the antibiotic; (ii) alteration of the target site to 
reduce or eliminate effective antibiotic binding; (iii) blocking antibiotic entry into the cell, or rapid removal from the cell 
after entry; and (iv) replacing the metabolic step inhibited by the antibiotic. 

Thus, it is time to fundamentally re-examine the philosophy of microbial killing strategies and develop new 
paradigms. One such paradigm is a holistic view of cellular metabolism. The identification of "sensitive" metabolic 
steps in attaining the necessary metabolic flux distributions to support growth and survival that can be attacked to 
weaken or destroy a microbe, need not be localized to a single biochemical reaction or cellular process. Rather, 
different cellular targets that need not be intimately related in the metabolic topology could be chosen based on the 
concerted effect the loss of each of these functions would have on metabolism. 

A similar strategy with viral infections has recently proved successful. It has been shown that "cocktails" of
different drugs that target different biochemical processes provide enhanced success in fighting against HIV infection.
Such a paradigm shift is possible only if the necessary biological information as well as appropriate methods of rational
analysis are available. Recent advances in the field of genomics and bioinformatics, in addition to mathematical
modeling, offer the possibility to realize this approach.

At present, the field of microbial genetics is entering a new era where the genomes of several 
microorganisms are being completely sequenced. It is expected that in a decade, or so, the nucleotide sequences of the 
genomes of all the major human pathogens will be completely determined. The sequencing of the genomes of 
pathogens such as Haemophilus influenzae has allowed researchers to compare the homology of proteins encoded by 
the open reading frames (ORFs) with those of Escherichia coli, resulting in valuable insight into the H. influenzae
metabolic features. Similar analyses, such as those performed with H. influenzae, will provide details of metabolism 
spanning the hierarchy of metabolic regulation from bacterial genomes to phenotypes. 

These developments provide exciting new opportunities to carry out conceptual experiments in silico to 
analyze different aspects of microbial metabolism and its regulation. Further, the synthesis of whole-cell models is 
made possible. Such models can account for each and every single metabolic reaction and thus enable the analysis of
their role in overall cell function. To implement such analysis, however, a mathematical modeling and simulation 
framework is needed which can incorporate the extensive metabolic detail but still retain computational tractability. 
Fortunately, rigorous and tractable mathematical methods have been developed for the required systems analysis of 
metabolism. 

A mathematical approach that is well suited to account for genomic detail and avoid reliance on kinetic
complexity has been developed based on the well-known stoichiometry of metabolic reactions. This approach is based on
metabolic flux balancing in a metabolic steady state. The history of flux balance models for metabolic analyses is
relatively short. Flux balancing has been applied to metabolic networks and to the study of adipocyte metabolism. Acetate
secretion from E. coli under ATP maximization conditions and ethanol secretion by yeast have also been investigated using
this approach.

The complete sequencing of a bacterial genome and ORF assignment provides the information needed to 
determine the relevant metabolic reactions that constitute metabolism in a particular organism. Thus a flux-balance 
model can be formulated and several metabolic analyses can be performed to extract metabolic characteristics for a 
particular organism. The flux balance approach can be easily applied to systematically simulate the effect of single, as 
well as multiple, gene deletions. This analysis will provide a list of sensitive enzymes that could be potential
antimicrobial targets. 

Developing a new paradigm for dealing with the emerging problem of antibiotic resistant pathogens
is of vital importance. The route towards the design of new antimicrobial agents must proceed along
directions that are different from those of the past. The rapid growth in bioinformatics has provided a wealth of
biochemical and genetic information that can be used to synthesize complete representations of cellular metabolism.
These models can be analyzed with relative computational ease through flux-balance models and visual computing
techniques. The ability to analyze the global metabolic network and understand the robustness and sensitivity of its
regulation under various growth conditions offers promise in developing novel methods of antimicrobial chemotherapy.

In one example, Pramanik et al. described a stoichiometric model of E. coli metabolism using flux-balance
modeling techniques (Stoichiometric Model of Escherichia coli Metabolism: Incorporation of Growth-Rate Dependent
Biomass Composition and Mechanistic Energy Requirements, Biotechnology and Bioengineering, Vol. 56, No. 4,
November 20, 1997). However, the analytical methods described by Pramanik et al. can only be used for situations in
which biochemical knowledge exists for the reactions occurring within an organism. Pramanik et al. produced a
model of metabolism for E. coli based on biochemical information rather than genomic data, since the
metabolic genes and related reactions for E. coli had already been well studied and characterized. Thus, this method is
inapplicable to determining a metabolic model for organisms for which little or no biochemical information on metabolic
enzymes and genes is known. It can be envisioned that in the future the only information we may have regarding an
emerging pathogen is its genomic sequence. What is needed in the art is a system and method for determining and
analyzing the entire metabolic network of organisms whose metabolic reactions have not yet been determined from
biochemical assays. The present invention provides such a system.

Summary of the Invention

This invention relates to constructing metabolic genotypes and genome specific stoichiometric matrices from 
genome annotation data. The functions of the metabolic genes in the target organism are determined by homology 
searches against databases of genes from similar organisms. Once a potential function is assigned to each metabolic 
gene of the target organism, the resulting data is analyzed. In one embodiment, each gene is subjected to a
flux-balance analysis to assess the effects of genetic deletions on the ability of the target organism to produce essential
biomolecules necessary for its growth. Thus, the invention provides a high-throughput computational method to screen 
for genetic deletions which adversely affect the growth capabilities of fully sequenced organisms. 

Embodiments of this invention also provide a computational, as opposed to an experimental, method for the 
rapid screening of genes and their gene products as potential drug targets to inhibit an organism's growth. This
invention utilizes the genome sequence, the annotation data, and the biomass requirements of an organism to
construct genomically complete metabolic genotypes and genome-specific stoichiometric matrices. These
stoichiometric matrices are analyzed using a flux-balance analysis. This invention describes how to assess the effects
of genetic deletions on the fitness and productive capabilities of an organism under given environmental and genetic
conditions.

Construction of a genome-specific stoichiometric matrix from genomic annotation data is illustrated along
with applying flux-balance analysis to study the properties of the stoichiometric matrix, and hence the metabolic
genotype of the organism. By limiting the constraints on various fluxes and altering the environmental inputs to the
metabolic network, genetic deletions may be analyzed for their effects on growth. This invention is embodied in a
software application that can be used to create the stoichiometric matrix for a fully sequenced and annotated genome.
Additionally, the software application can be used to further analyze and manipulate the network so as to predict the
ability of an organism to produce biomolecules necessary for growth, thus essentially simulating a genetic deletion.

Brief Description of the Drawings



Figure 1 is a flow diagram illustrating one procedure for creating metabolic genotypes from genomic 
sequence data for any organism. 

Figure 2 is a flow diagram illustrating one procedure for producing in silico microbial strains from the 
metabolic genotypes created by the method of Figure 1, along with additional biochemical and microbiological data. 

Figure 3 is a graph illustrating a prediction of genome scale shifts in transcription. The graph shows the 
different phases of the metabolic response to varying oxygen availability, starting from completely aerobic to 
completely anaerobic in E. coli. The predicted changes in expression pattern between phases II and V are indicated.



This invention relates to systems and methods for utilizing genome annotation data to construct a
stoichiometric matrix representing most or all of the metabolic reactions that occur within an organism. Using these
systems and methods, the properties of this matrix can be studied under conditions simulating genetic deletions in
order to predict the effect of a particular gene on the fitness of the organism. Moreover, genes that are vital to the
growth of an organism can be found by selectively removing various genes from the stoichiometric matrix and
thereafter analyzing whether an organism with this genetic makeup could survive. Analysis of these lethal genetic
mutations is useful for identifying potential genetic targets for antimicrobial drugs.

It should be noted that the systems and methods described herein can be implemented on any conventional 
host computer system, such as those based on Intel® microprocessors and running Microsoft Windows operating 
systems. Other systems, such as those using the UNIX or LINUX operating system and based on IBM®, DEC® or 
Motorola® microprocessors are also contemplated. The systems and methods described herein can also be 
implemented to run on client-server systems and wide-area networks, such as the Internet. 

Software to implement the system can be written in any well-known computer language, such as Java, C,
C++, Visual Basic, FORTRAN or COBOL and compiled using any well-known compatible compiler.

The software of the invention normally runs from instructions stored in a memory on the host computer 
system. Such a memory can be a hard disk, Random Access Memory, Read Only Memory or Flash Memory. Other
types of memories are also contemplated to function within the scope of the invention. 

A process 10 for producing metabolic genotypes from an organism is shown in Figure 1. Beginning at a start 
state 12, the process 10 then moves to a state 14 to obtain the genomic DNA sequence of an organism. The 
nucleotide sequence of the genomic DNA can be rapidly determined for an organism with a genome size on the order of 
a few million base pairs. One method for obtaining the nucleotide sequences in a genome is through commercial gene 
databases. Many gene sequences are available on-line through a number of sites (see, for example, www.tigr.org) and
can easily be downloaded from the Internet. Currently, there are 16 microbial genomes that have been fully sequenced
and are publicly available, with countless others held in proprietary databases. It is expected that a number of other
organisms, including pathogenic organisms, will be found in nature for which little experimental information, except for
their genome sequences, will be available.

Once the nucleotide sequence of the entire genomic DNA in the target organism has been obtained at state
14, the coding regions, also known as open reading frames, are determined at a state 16. Using existing computer
algorithms, the location of open reading frames that encode genes from within the genome can be determined. For
example, to identify the proper location, strand, and reading frame of an open reading frame one can perform a gene
search by signal (promoters, ribosomal binding sites, etc.) or by content (positional base frequencies, codon
preference). Computer programs for determining open reading frames are available, for example, from the University of
Wisconsin Genetics Computer Group and the National Center for Biotechnology Information.
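
As a rough illustration of a search for open reading frames, the following Python sketch scans both strands and all three reading frames for stretches that begin with ATG and end at a stop codon. It is not the GCG or NCBI software mentioned above. The function names, the ATG-only start rule, and the minimum length are assumptions made for the example, and a practical gene finder would also weigh signals and codon preference.

```python
# Minimal ORF scan: an illustrative sketch, not a production gene finder.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Yield (strand, frame, start, end) for ATG...stop stretches.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and includes the stop codon. min_codons counts the start
    codon and excludes the stop codon (an assumed, arbitrary threshold).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield (strand, frame, start, i + 3)
                    start = None


# Hypothetical toy sequence with one short ORF on the forward strand.
orfs = list(find_orfs("CCATGAAATTTGGGTAACC"))
```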

After the locations of the open reading frames have been determined at state 16, the process 10 moves to
state 18 to assign a function to the protein encoded by each open reading frame. The discovery that an open reading
frame or gene has sequence homology to a gene coding for a protein of known function, or family of proteins of known
function, can provide the first clues about the function of the gene and its related protein. After the locations of the open
reading frames have been determined in the genomic DNA from the target organism, well-established algorithms (e.g.,
the Basic Local Alignment Search Tool (BLAST) and the FAST family of programs) can be used to determine the extent
of similarity between a given sequence and gene/protein sequences deposited in worldwide genetic databases. If a
coding region from a gene in the target organism is homologous to a gene within one of the sequence databases, the
open reading frame is assigned a function similar to the homologously matched gene. Thus, the functions of nearly the
entire gene complement or genotype of an organism can be determined so long as homologous genes have already been
discovered.
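
The assignment rule of state 18 might be sketched as follows. The tuple layout of the hits, the E-value cutoff of 1e-5, and the example identifiers and annotations are assumptions for illustration only; real similarity-search output would first have to be parsed into this form.

```python
def assign_functions(hits, max_evalue=1e-5):
    """Map each ORF to the annotation of its best acceptable database hit.

    hits: iterable of (orf_id, annotation, evalue) tuples (assumed layout).
    ORFs with no hit at or below max_evalue are left unassigned.
    """
    best = {}
    for orf_id, annotation, evalue in hits:
        if evalue > max_evalue:
            continue  # similarity too weak to support an assignment
        if orf_id not in best or evalue < best[orf_id][1]:
            best[orf_id] = (annotation, evalue)
    return {orf_id: annotation for orf_id, (annotation, _) in best.items()}


# Hypothetical search results for three ORFs.
example_hits = [
    ("orf001", "phosphofructokinase", 1e-80),
    ("orf001", "hypothetical protein", 1e-12),
    ("orf002", "DNA gyrase subunit A", 1e-3),  # above the cutoff, ignored
    ("orf003", "pyruvate kinase", 1e-50),
]
functions = assign_functions(example_hits)
```

An ORF that receives no assignment here simply stays out of the annotated gene list, which matches the caveat that functions can only be assigned when homologous genes have already been discovered.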

All of the genes involved in metabolic reactions and functions in a cell comprise only a subset of the 
genotype. This subset of genes is referred to as the metabolic genotype of a particular organism. Thus, the metabolic
genotype of an organism includes most or all of the genes involved in the organism's metabolism. The gene products 
produced from the set of metabolic genes in the metabolic genotype carry out all or most of the enzymatic reactions 
and transport reactions known to occur within the target organism as determined from the genomic sequence. 

To begin the selection of this subset of genes, one can simply search through the list of functional gene
assignments from state 18 to find genes involved in cellular metabolism. This would include genes involved in central
metabolism, amino acid metabolism, nucleotide metabolism, fatty acid and lipid metabolism, carbohydrate assimilation,
vitamin and cofactor biosynthesis, energy and redox generation, etc. This subset is generated at a state 20. The
process 10 of determining the metabolic genotype of the target organism from genomic data then terminates at an end
state 22.
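
The selection made at state 20 can be pictured as a simple filter over the functional assignments. In the sketch below the category labels are taken from the list above, while the data layout (gene mapped to a function and a category) and the three example genes are assumptions for illustration.

```python
# Categories named in the text; the "etc." in the source means the real
# list is open-ended, so this set is only a starting point.
METABOLIC_CATEGORIES = {
    "central metabolism",
    "amino acid metabolism",
    "nucleotide metabolism",
    "fatty acid and lipid metabolism",
    "carbohydrate assimilation",
    "vitamin and cofactor biosynthesis",
    "energy and redox generation",
}


def metabolic_genotype(assignments):
    """Keep only genes whose assigned category is metabolic.

    assignments: dict mapping gene -> (function, category) (assumed layout).
    """
    return {
        gene: function
        for gene, (function, category) in assignments.items()
        if category.lower() in METABOLIC_CATEGORIES
    }


# Hypothetical annotated genes.
annotated = {
    "pfkA": ("phosphofructokinase", "Central metabolism"),
    "gyrA": ("DNA gyrase subunit A", "DNA replication"),
    "pykF": ("pyruvate kinase", "Central metabolism"),
}
genotype = metabolic_genotype(annotated)
```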

Referring now to Figure 2, a process 50 of producing a computer model of an organism is shown. This process is
also known as producing in silico microbial strains. The process 50 begins at a start state 52 (same as end state 22
of process 10) and then moves to a state 54 wherein biochemical information is gathered for the reactions performed
by each metabolic gene product for each of the genes in the metabolic genotype determined from process 10.

For each gene in the metabolic genotype, the substrates and products, as well as the stoichiometry of any
and all reactions performed by the gene product of each gene can be determined by reference to the biochemical
literature. This includes information regarding the irreversible or reversible nature of the reactions. The stoichiometry
of each reaction provides the molecular ratios in which reactants are converted into products.

Potentially, there may still remain a few reactions in cellular metabolism which are known to occur from in
vitro assays and experimental data. These would include well characterized reactions for which a gene or protein has
yet to be identified, or was unidentified from the genome sequencing and functional assignment of states 14 and 18.
This would also include the transport of metabolites into or out of the cell by uncharacterized genes related to
transport. Thus, one reason for the missing gene information may be a lack of characterization of the actual
gene that performs a known biochemical conversion. Therefore, upon careful review of existing biochemical literature
and available experimental data, additional metabolic reactions can be added at a state 56 to the list of metabolic
reactions determined from the metabolic genotype at state 54. This would include information regarding the
substrates, products, reversibility/irreversibility, and stoichiometry of the reactions.

All of the information obtained at states 54 and 56 regarding reactions and their stoichiometry can be 
represented in a matrix format typically referred to as a stoichiometric matrix. Each column in the matrix corresponds 
to a given reaction or flux, and each row corresponds to the different metabolites involved in the given reaction/flux. 
Reversible reactions may either be represented as one reaction that operates in both the forward and reverse direction 
or be decomposed into one forward reaction and one backward reaction in which case all fluxes can only take on 
positive values. Thus, a given position in the matrix describes the stoichiometric participation of a metabolite (listed 
in the given row) in a particular flux of interest (listed in the given column). Together all of the columns of the genome 
specific stoichiometric matrix represent all of the chemical conversions and cellular transport processes that are 
determined to be present in the organism. This includes all internal fluxes and so called exchange fluxes operating 
within the metabolic network. Thus, the process 50 moves to a state 58 in order to formulate all of the cellular 
reactions together in a genome specific stoichiometric matrix. The resulting genome specific stoichiometric matrix is 
a fundamental representation of a genomically and biochemically defined genotype. 
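
A minimal sketch of the matrix layout described above, assuming a hypothetical four-reaction toy network (the reaction and metabolite names are invented for the example). Rows are metabolites, columns are fluxes, negative entries mark consumption and positive entries mark production.

```python
def build_stoichiometric_matrix(reactions):
    """Build S from a dict of reaction -> {metabolite: coefficient}.

    Returns (metabolites, reaction_names, S), where S[i][j] is the
    coefficient of metabolite i in reaction j.
    """
    reaction_names = list(reactions)
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    S = [
        [reactions[r].get(m, 0) for r in reaction_names]
        for m in metabolites
    ]
    return metabolites, reaction_names, S


# Toy network (hypothetical): A is taken up, converted to B, then to C,
# and C leaves the system through an exchange flux.
toy_reactions = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "B_to_C": {"B": -1, "C": 1},
    "export_C": {"C": -1},
}
metabolites, reaction_names, S = build_stoichiometric_matrix(toy_reactions)

# With every flux equal to 1, production and consumption of each
# metabolite cancel, so each row of S times v sums to zero.
v = [1, 1, 1, 1]
balance = [sum(s * f for s, f in zip(row, v)) for row in S]
```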

After the genome specific stoichiometric matrix is defined at state 58, the metabolic demands placed on the 
organism are calculated. The metabolic demands can be readily determined from the dry weight composition of the 
cell. In the case of well-studied organisms such as Escherichia coli and Bacillus subtilis, the dry weight composition is 
available in the published literature. However, in some cases it will be necessary to experimentally determine the dry 
weight composition of the cell for the organism in question. This can be accomplished with varying degrees of 
accuracy. The first attempt would measure the RNA, DNA, protein, and lipid fractions of the cell. A more detailed 
analysis would also provide the specific fraction of nucleotides, amino acids, etc. The process 50 moves to state 60 
for the determination of the biomass composition of the target organism. 

The process 50 then moves to state 62 to perform several experiments that determine the uptake rates and 
maintenance requirements for the organism. Microbiological experiments can be carried out to determine the uptake 
rates for many of the metabolites that are transported into the cell. The uptake rate is determined by measuring the 
depletion of the substrate from the growth media. The measurement of the biomass at each point is also required, in 
order to determine the uptake rate per unit biomass. The maintenance requirements can be determined from a 
chemostat experiment. The glucose uptake rate is plotted versus the growth rate, and the y-intercept is interpreted as
the non-growth associated maintenance requirements. The growth associated maintenance requirements are 
determined by fitting the model results to the experimentally determined points in the growth rate versus glucose 
uptake rate plot. 
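
The y-intercept reading described above amounts to a straight-line fit. The sketch below uses made-up chemostat data and an ordinary least-squares fit; the numbers, units, and variable names are assumptions, and the growth associated maintenance term, which the text obtains by fitting the model itself, is not computed here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical chemostat data: growth rate (1/h) and glucose uptake
# rate per unit biomass (mmol per gram dry weight per hour).
growth_rate = [0.1, 0.2, 0.3, 0.4]
glucose_uptake = [1.5, 2.5, 3.5, 4.5]

slope, intercept = fit_line(growth_rate, glucose_uptake)

# The intercept is the uptake extrapolated to zero growth, read here as
# the non-growth associated maintenance requirement.
non_growth_maintenance = intercept
```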

Next, the process 50 moves to a state 64 wherein information regarding the metabolic demands and uptake
rates obtained at state 62 is combined with the genome specific stoichiometric matrix of state 58; together these fully
define the metabolic system using flux balance analysis (FBA). This is an approach well suited to account for genomic detail
as it has been developed based on the well-known stoichiometry of metabolic reactions.

The time constants characterizing metabolic transients and/or metabolic reactions are typically very rapid, on
the order of milliseconds to seconds, compared to the time constants of cell growth, which are on the order of hours to
days. Thus, the transient mass balances can be simplified to consider only the steady state behavior. Eliminating the
time derivatives obtained from dynamic mass balances around every metabolite in the metabolic system yields the
system of linear equations represented in matrix notation,

$$S \cdot v = 0 \qquad \text{(Equation 1)}$$

where S refers to the stoichiometric matrix of the system, and v is the flux vector. This equation simply states that
over long times, the formation fluxes of a metabolite must be balanced by the degradation fluxes. Otherwise,
significant amounts of the metabolite will accumulate inside the metabolic network. Applying Equation 1 to our
system, we let S now represent the genome specific stoichiometric matrix.

To determine the metabolic capabilities of a defined metabolic genotype, Equation 1 is solved for the
metabolic fluxes and the internal metabolic reactions, v, while imposing constraints on the activity of these fluxes.
Typically the number of metabolic fluxes is greater than the number of mass balances (i.e., m > n, with m fluxes and
n mass balances), resulting in a plurality of feasible flux distributions that satisfy Equation 1 and any constraints
placed on the fluxes of the system.
This range of solutions is indicative of the flexibility in the flux distributions that can be achieved with a given set of
metabolic reactions. The solutions to Equation 1 lie in a restricted region. This subspace defines the capabilities of
the metabolic genotype of a given organism, since the allowable solutions that satisfy Equation 1 and any constraints
placed on the fluxes of the system define all the metabolic flux distributions that can be achieved with a particular set
of metabolic genes.

The particular utilization of the metabolic genotype can be defined as the metabolic phenotype that is
expressed under those particular conditions. Objectives for metabolic function can be chosen to explore the 'best' use
of the metabolic network within a given metabolic genotype. The solution to Equation 1 can be formulated as a linear
programming problem, in which the flux distribution that minimizes a particular objective is found. Mathematically,
this optimization can be stated as:

$$\text{Minimize } Z \qquad \text{(Equation 2)}$$

$$Z = \sum_j c_j \, v_j \qquad \text{(Equation 3)}$$

where Z is the objective, which is represented as a linear combination of the metabolic fluxes $v_j$ with coefficients
$c_j$. The optimization can also be stated as the equivalent maximization problem, i.e., by changing the sign on Z.

This general representation of Z enables the formulation of a number of diverse objectives. These objectives 
can be design objectives for a strain, exploitation of the metabolic capabilities of a genotype, or physiologically 
meaningful objective functions, such as maximum cellular growth. For this application, growth is to be defined in 
terms of biosynthetic requirements based on literature values of biomass composition or experimentally determined 
values such as those obtained from state 60. Thus, we can define biomass generation as an additional reaction flux 
draining intermediate metabolites in the appropriate ratios and represented as an objective function Z. In addition to 
draining intermediate metabolites this reaction flux can be formed to utilize energy molecules such as ATP, NADH and 
NADPH so as to incorporate any maintenance requirement that must be met. This new reaction flux then becomes 
another constraint/balance equation that the system must satisfy as the objective function. It is analogous to adding
an additional column to the stoichiometric matrix of Equation 1 to represent such a flux to describe the production
demands placed on the metabolic system. Setting this new flux as the objective function and asking the system to 
maximize the value of this flux for a given set of constraints on all the other fluxes is then a method to simulate the 
growth of the organism. 
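
The growth simulation described above can be sketched on a toy network. The source does not name a solver; SciPy's `linprog` is used here as an assumption, and the two-metabolite network, the uptake limit of 10, and the reaction names are invented for the example. The biomass drain is the last column of S, and because `linprog` minimizes, the objective coefficient on that column is -1.

```python
from scipy.optimize import linprog

# Columns: uptake_A, A_to_B, A_to_byproduct, biomass (drains B).
# Rows: metabolites A and B. Toy values, not taken from the source.
S = [
    [1, -1, -1, 0],   # A: produced by uptake, consumed by two reactions
    [0, 1, 0, -1],    # B: produced from A, drained into biomass
]

# Lower and upper limits on each flux; None means unbounded above.
bounds = [(0, 10), (0, None), (0, None), (0, None)]

# Maximizing the biomass flux is the same as minimizing its negative.
c = [0, 0, 0, -1]

result = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds)
fluxes = result.x
growth = fluxes[3]
```

In this toy case the best strategy routes all of the substrate through B, so the predicted growth flux equals the uptake limit.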

Using linear programming, additional constraints can be placed on the value of any of the fluxes in the
metabolic network.

$$\beta_j \le v_j \le \alpha_j \qquad \text{(Equation 4)}$$

These constraints could be representative of a maximum allowable flux through a given reaction, possibly
resulting from a limited amount of an enzyme present, in which case the value for $\alpha_j$ would take on a finite value.
These constraints could also be used to include the knowledge of the minimum flux through a certain metabolic
reaction, in which case the value for $\beta_j$ would take on a finite value. Additionally, if one chooses to leave certain
reversible reactions or transport fluxes to operate in a forward and reverse manner, the flux may remain unconstrained
by setting $\beta_j$ to negative infinity and $\alpha_j$ to positive infinity. If reactions proceed only in the forward direction,
$\beta_j$ is set to zero while $\alpha_j$ is set to positive infinity. As an example, to simulate the event of a genetic deletion, the
fluxes through all of the corresponding metabolic reactions related to the gene in question are reduced to zero by setting
$\beta_j$ and $\alpha_j$ to zero in Equation 4. Based on the in vivo environment in which the bacterium lives, one can determine
the metabolic resources available to the cell for biosynthesis of the essential molecules for biomass. Allowing the
corresponding transport fluxes to be active provides the in silico bacterium with inputs and outputs for substrates and
by-products produced by the metabolic network. Therefore, as an example, if one wished to simulate the absence of a
particular growth substrate, one simply constrains the corresponding transport fluxes allowing the metabolite to enter the
cell to be zero by setting $\beta_j$ and $\alpha_j$ to zero in Equation 4. On the other hand, if a substrate is only allowed to enter
or exit the cell via transport mechanisms, the corresponding fluxes can be properly constrained to reflect this scenario.
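
The use of the flux limits to mimic a gene deletion or a missing substrate can be sketched as follows. SciPy's `linprog` is again an assumed solver choice, and the three-reaction network, the helper names, and the uptake limit are hypothetical.

```python
from scipy.optimize import linprog

# Columns: uptake_A, A_to_B, biomass (drains B). Rows: metabolites A, B.
S = [
    [1, -1, 0],
    [0, 1, -1],
]

# One (lower, upper) pair per flux, playing the role of the limits in
# Equation 4; None means unbounded above.
WILD_TYPE_BOUNDS = [(0, 10), (0, None), (0, None)]


def max_growth(bounds):
    """Maximize the biomass flux (last column) subject to S v = 0."""
    res = linprog([0, 0, -1], A_eq=S, b_eq=[0, 0], bounds=bounds)
    return res.x[2] if res.status == 0 else 0.0


def with_zero_flux(bounds, j):
    """Return a copy of bounds with both limits of flux j set to zero."""
    new_bounds = list(bounds)
    new_bounds[j] = (0, 0)
    return new_bounds


wild_type = max_growth(WILD_TYPE_BOUNDS)
# Deleting the gene for A_to_B removes the only route to B.
gene_deleted = max_growth(with_zero_flux(WILD_TYPE_BOUNDS, 1))
# Closing the uptake flux simulates the absence of substrate A.
no_substrate = max_growth(with_zero_flux(WILD_TYPE_BOUNDS, 0))
```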

Together, the linear programming representation of the genome-specific stoichiometric matrix as in Equation 1,
along with any general constraints placed on the fluxes in the system and any of the possible objective functions,
completes the formulation of the in silico bacterial strain. The in silico strain can then be used to study theoretical
metabolic capabilities by simulating any number of conditions and generating flux distributions through the use of
linear programming. The process 50 of formulating the in silico strain and simulating its behavior using linear
programming techniques terminates at an end state 66.

WO 00/46405 PCT/US00/02882

Thus, by adding or removing constraints on various fluxes in the network it is possible to (1) simulate a
genetic deletion event and (2) simulate or accurately provide the network with the metabolic resources present in its in
vivo environment. Using flux balance analysis it is possible to determine the effects of the removal or addition of
particular genes and their associated reactions to the composition of the metabolic genotype on the range of possible
metabolic phenotypes. If the removal/deletion does not allow the metabolic network to produce necessary precursors
for growth, and the cell cannot obtain these precursors from its environment, the deleted gene(s) have potential as
antimicrobial drug targets. Thus, by adjusting the constraints and defining the objective function we can explore the
capabilities of the metabolic genotype using linear programming to optimize the flux distribution through the metabolic
network. This creates what we will refer to as an in silico bacterial strain capable of being studied and manipulated to
analyze, interpret, and predict the genotype-phenotype relationship. It can be applied to assess the effects of
incremental changes in the genotype or changing environmental conditions, and provide a tool for computer aided
experimental design. It should be realized that other types of organisms can similarly be represented in silico and still
be within the scope of the invention.
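
A deletion screen of the kind described above might look like the following sketch, which tests every single gene and every pair of genes on a toy network. The network, the gene names g1 to g3, the gene-to-reaction map, and the zero-growth tolerance are all hypothetical, and SciPy's `linprog` is an assumed solver choice.

```python
from itertools import combinations

from scipy.optimize import linprog

# Columns: uptake_A, A_to_B (g1), A_to_B alternative (g2), B_to_C (g3),
# biomass (drains C). Rows: metabolites A, B, C.
S = [
    [1, -1, -1, 0, 0],
    [0, 1, 1, -1, 0],
    [0, 0, 0, 1, -1],
]
BOUNDS = [(0, 10), (0, None), (0, None), (0, None), (0, None)]
GENE_COLUMNS = {"g1": [1], "g2": [2], "g3": [3]}
TOL = 1e-6  # growth below this is treated as no growth


def growth_without(genes):
    """Maximal biomass flux after zeroing every reaction of the given genes."""
    bounds = list(BOUNDS)
    for gene in genes:
        for j in GENE_COLUMNS[gene]:
            bounds[j] = (0, 0)
    res = linprog([0, 0, 0, 0, -1], A_eq=S, b_eq=[0, 0, 0], bounds=bounds)
    return res.x[4] if res.status == 0 else 0.0


# Genes whose individual loss abolishes growth.
lethal_single = sorted(g for g in GENE_COLUMNS if growth_without([g]) < TOL)

# Pairs that abolish growth although neither gene is lethal on its own.
lethal_pairs = sorted(
    pair
    for pair in combinations(sorted(GENE_COLUMNS), 2)
    if not set(pair) & set(lethal_single) and growth_without(pair) < TOL
)
```

Here g3 is flagged as a single candidate target, while g1 and g2 only matter together because each can substitute for the other, which is the kind of concerted effect a combination of targets would exploit.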

The construction of a genome specific stoichiometric matrix and in silico microbial strains can also be applied 
to the area of signal transduction. The components of signaling networks can be identified within a genome and used 
to construct a content matrix that can be further analyzed using various techniques to be determined in the future.

Example 1: E. coli metabolic genotype and in silico model

Using the methods disclosed in Figures 1 and 2, an in silico strain of Escherichia coli K-12 has been
constructed and represents the first such strain of a bacterium largely generated from annotated sequence data and
from biochemical information. The genetic sequence and open reading frame identifications and assignments are
readily available from a number of on-line locations (e.g., www.tigr.org). For this example we obtained the annotated
sequence from the website of the E. coli Genome Project at the University of Wisconsin
(http://www.genetics.wisc.edu/). Details regarding the actual sequencing and annotation of the sequence can be found
at that site. From the genome annotation data the subset of genes involved in cellular metabolism was determined as
described above in Figure 1, state 20, comprising the metabolic genotype of the particular strain of E. coli.
Through detailed analysis of the published biochemical literature on E. coli we determined (1) all of the
reactions associated with the genes in the metabolic genotype and (2) any additional reactions known to occur from
biochemical data which were not represented by the genes in the metabolic genotype. This provided all of the
necessary information to construct the genome specific stoichiometric matrix for E. coli K-12.

Briefly, the E. coli K-12 bacterial metabolic genotype and more specifically the genome specific
stoichiometric matrix contains 731 metabolic processes that influence 436 metabolites (dimensions of the genome
specific stoichiometric matrix are 436 x 731). There are 80 reactions present in the genome specific stoichiometric
matrix that do not have a genetic assignment in the annotated genome, but are known to be present from biochemical
data. The genes contained within this metabolic genotype are shown in Table 1 along with the corresponding
reactions they carry out.
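To make the matrix dimensions concrete, the following minimal sketch assembles a stoichiometric matrix with one row per metabolite and one column per reaction from a reaction list. The four reactions are hypothetical placeholders, not entries from Table 1, and the dictionary layout is an assumption chosen for the example.

```python
def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_names, S).

    S[i][j] is the coefficient of metabolite i in reaction j:
    negative when consumed, positive when produced.
    """
    reaction_names = list(reactions)
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    row = {m: i for i, m in enumerate(metabolites)}
    S = [[0] * len(reaction_names) for _ in metabolites]
    for j, name in enumerate(reaction_names):
        for met, coeff in reactions[name].items():
            S[row[met]][j] = coeff
    return metabolites, reaction_names, S


# Hypothetical toy reactions (assumed for illustration only).
toy_reactions = {
    "uptake_A": {"A": 1},            # exchange flux supplying A
    "A_to_B":   {"A": -1, "B": 1},
    "B_to_2C":  {"B": -1, "C": 2},
    "drain_C":  {"C": -1},           # demand flux removing C
}
metabolites, names, S = build_stoichiometric_matrix(toy_reactions)

# A flux vector is at steady state when S multiplied by v is zero for every metabolite.
v = [1, 1, 1, 2]
balance = [sum(S[i][j] * v[j] for j in range(len(names))) for i in range(len(metabolites))]
```

Here the toy matrix is 3 x 4; built the same way from the full reaction list, the E. coli K-12 matrix described above would be 436 x 731.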

Because E. coli is arguably the best studied organism, it was possible to determine the uptake rates and
maintenance requirements (state 62 of Figure 2) by reference to the published literature. This in silico strain accounts
for the metabolic capabilities of E. coli. It includes membrane transport processes, the central catabolic pathways,
utilization of alternative carbon sources and the biosynthetic pathways that generate all the components of the
biomass. In the case of E. coli K-12, we can call upon the wealth of data on overall metabolic behavior and detailed
biochemical information about the in vivo genotype to which we can compare the behavior of the in silico strain. One
utility of FBA is the ability to learn about the physiology of the particular organism and explore its metabolic
capabilities without any specific biochemical data. This ability is important considering possible future scenarios in
which the only data that we may have for a newly discovered bacterium (perhaps pathogenic) could be its genome
sequence.

Example 2: in silico deletion analysis for E. coli to find antimicrobial targets 

Using the in silico strain constructed in Example 1, the effect of individual deletions of all the enzymes in
central metabolism can be examined in silico. For the analysis to determine sensitive linkages in the metabolic network
of E. coli, the objective function utilized is the maximization of the biomass yield. This is defined as a flux draining the
necessary biosynthetic precursors in the appropriate ratios. These ratios are set by the biomass composition, which
can be determined from the literature. See Neidhardt et al., Escherichia coli and Salmonella: Cellular and Molecular
Biology, Second Edition, ASM Press, Washington D.C., 1996. Thus, the objective function is the maximization of a
single flux, this biosynthetic flux.

Constraints are placed on the network to account for the availability of substrates for the growth of E. coli.
In the initial deletion analysis, growth was simulated in an aerobic glucose minimal media culture. Therefore, the
constraints are set to allow for the components included in the media to be taken up. The specific uptake rate can be
included if the value is known; otherwise, an unlimited supply can be provided. The uptake rates of glucose and oxygen
have been determined for E. coli (Neidhardt et al., Escherichia coli and Salmonella: Cellular and Molecular Biology,
Second Edition, ASM Press, Washington D.C., 1996). Therefore, these values are included in the analysis. The uptake
rates for the phosphate, sulfur, and nitrogen sources are not precisely known, so constraints on the fluxes for the uptake
of these important substrates are not included, and the metabolic network is allowed to take up any required amount of
these substrates.
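The optimization described in this example can be written compactly as a linear program. The notation below is assumed for illustration: S is the genome specific stoichiometric matrix, v the vector of fluxes, v_growth the biomass flux, and v_glc^max and v_O2^max the literature uptake rates for glucose and oxygen. The symbols themselves are not taken from the source.

```latex
\begin{aligned}
\text{maximize}\quad & v_{\mathrm{growth}} \\
\text{subject to}\quad & S\,v = 0, \\
& 0 \le v_{\mathrm{glc}} \le v_{\mathrm{glc}}^{\max}, \qquad
  0 \le v_{\mathrm{O_2}} \le v_{\mathrm{O_2}}^{\max}, \\
& 0 \le v_{k} < \infty \quad \text{for the uptake of the phosphate, sulfur, and nitrogen sources}, \\
& v_{j} = 0 \quad \text{for each reaction } j \text{ that depends on the deleted gene product.}
\end{aligned}
```

Under this formulation a deletion is scored as lethal when the optimal value of v_growth is zero.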

The results showed that a high degree of redundancy exists in central intermediary metabolism during growth
in glucose minimal media, which is related to the interconnectivity of the metabolic reactions. Only a few metabolic
functions were found to be essential such that their loss removes the capability of cellular growth on glucose. For
growth on glucose, the essential gene products are involved in the 3-carbon stage of glycolysis, three reactions of the
TCA cycle, and several points within the PPP. Deletions in the 6-carbon stage of glycolysis result in a reduced ability
to support growth due to the diversion of additional flux through the PPP.

The results from the gene deletion study can be directly compared with growth data from mutants. The
growth characteristics of a series of E. coli mutants on several different carbon sources were examined (80 cases
were determined from the literature), and compared to the in silico deletion results (Table 2). The majority (73 of 80
cases or 91%) of the mutant experimental observations are consistent with the predictions of the in silico study. The
results from the in silico gene deletion analysis are thus consistent with experimental observations.
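The agreement figure quoted above is a simple tally of matching growth calls. A minimal sketch of that tally follows; the list of cases is a placeholder built only to reproduce the reported counts (73 matches out of 80), not the actual entries of Table 2.

```python
def agreement(observations):
    """Count cases where the in vivo and in silico growth calls agree.

    observations is a list of (in_vivo, in_silico) pairs, each '+' or '-'.
    Returns (matches, total, fraction).
    """
    matches = sum(1 for vivo, silico in observations if vivo == silico)
    total = len(observations)
    return matches, total, matches / total


# Placeholder data reproducing only the reported totals (assumed, not Table 2).
cases = [("+", "+")] * 73 + [("+", "-")] * 7
matches, total, fraction = agreement(cases)
percent = round(fraction * 100)
```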

Example 3: Prediction of genome scale shifts in gene expression 

Flux based analysis can be used to predict metabolic phenotypes under different growth conditions, such as
substrate and oxygen availability. The relation between the flux value and the gene expression levels is non-linear,
resulting in bifurcations and multiple steady states. However, FBA can give qualitative (on/off) information as well as
the relative importance of gene products under a given condition. Based on the magnitude of the metabolic fluxes, a
qualitative assessment of gene expression can be inferred.

Figure 3a shows the five phases of distinct metabolic behavior of E. coli in response to varying oxygen
availability, going from completely anaerobic (phase I) to completely aerobic (phase V). Figures 3b and 3c display lists
of the genes that are predicted to be induced or repressed upon the shift from aerobic growth (phase V) to nearly
complete anaerobic growth (phase II). The numerical values shown in Figures 3b and 3c are the fold change in the
magnitude of the fluxes calculated for each of the listed enzymes.
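A fold change in flux magnitude of the kind reported in Figures 3b and 3c can be computed from two flux distributions as sketched below. The enzyme names, flux values, and the two-fold threshold are all assumptions made for the example and do not come from the figures.

```python
def classify_flux_shift(aerobic, anaerobic, threshold=2.0):
    """Label each enzyme by the fold change in flux magnitude between two conditions.

    Returns a dict mapping enzyme to (label, fold_change); fold_change is None
    when the aerobic flux is zero and the ratio is undefined.
    """
    calls = {}
    for enzyme in sorted(set(aerobic) | set(anaerobic)):
        before = abs(aerobic.get(enzyme, 0.0))
        after = abs(anaerobic.get(enzyme, 0.0))
        if before == 0.0 and after == 0.0:
            calls[enzyme] = ("off in both", None)
        elif before == 0.0:
            calls[enzyme] = ("induced", None)    # switched on
        elif after == 0.0:
            calls[enzyme] = ("repressed", 0.0)   # switched off
        else:
            fold = after / before
            if fold >= threshold:
                label = "induced"
            elif fold <= 1.0 / threshold:
                label = "repressed"
            else:
                label = "unchanged"
            calls[enzyme] = (label, fold)
    return calls


# Hypothetical flux values for four hypothetical enzymes.
aerobic = {"enzyme_1": 2.0, "enzyme_2": 8.0, "enzyme_3": 0.0, "enzyme_4": 5.0}
anaerobic = {"enzyme_1": 6.0, "enzyme_2": 1.0, "enzyme_3": 3.0, "enzyme_4": 5.5}
calls = classify_flux_shift(aerobic, anaerobic)
```

Because the relation between flux and expression is non-linear, such labels are best read as qualitative on/off indications rather than quantitative expression levels.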

For this example, the objective of maximization of biomass yield is utilized (as described above). The
constraints on the system are also set accordingly (as described above). However, in this example, a change in the
availability of a key substrate leads to changes in the metabolic behavior. The change in the parameter is reflected
as a change in the uptake flux. Therefore, the maximal allowable oxygen uptake rate is changed to generate this data.
The figure demonstrates how several fluxes in the metabolic network will change as the oxygen uptake flux is
continuously decreased. The constraints on the fluxes are therefore identical to those described in the previous
section, except that the oxygen uptake rate is set to coincide with the corresponding point in the diagram.

Corresponding experimental data sets are now becoming available. Using high-density oligonucleotide arrays
the expression levels of nearly every gene in Saccharomyces cerevisiae can now be analyzed under various growth
conditions. From these studies it was shown that nearly 90% of all yeast mRNAs are present in growth on rich and
minimal media, while a large number of mRNAs were shown to be differentially expressed under these two conditions.
Another recent article shows how the metabolic and genetic control of gene expression can be studied on a genomic
scale using DNA microarray technology (Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic
Scale, Science, Vol. 278, October 24, 1997). The temporal changes in genetic expression profiles that occur during the
diauxic shift in S. cerevisiae were observed for every known expressed sequence tag (EST) in this genome. As shown
above, FBA can be used to qualitatively simulate shifts in metabolic genotype expression patterns due to alterations in
growth environments. Thus, FBA can serve to complement current studies in metabolic gene expression, by providing
a fundamental approach to analyze, interpret, and predict the data from such experiments.
Example 4: Design of defined media 

An important economic consideration in large-scale bioprocesses is optimal medium formulation. FBA can be
used to design such media. Following the approach defined above, a flux-balance model for the first completely
sequenced free living organism, Haemophilus influenzae, has been generated. One application of this model is to
predict a minimal defined medium. It was found that H. influenzae can grow on the minimal defined medium as
determined from the ORF assignments and predicted using FBA. Simulated bacterial growth was predicted using the
following defined medium: fructose, arginine, cysteine, glutamate, putrescine, spermidine, thiamin, NAD, tetrapyrrole,
pantothenate, ammonia, phosphate. This predicted minimal medium was compared to the previously published defined
media and was found to differ in only one compound, inosine. It is known that inosine is not required for growth;
however, it does serve to enhance growth. Again the in silico results obtained were consistent with published in vivo
research. These results provide confidence in the use of this type of approach for the design of defined media for
organisms for which no defined medium currently exists.
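The source does not state how the minimal medium was derived from the model. One possible approach, assumed here purely for illustration, is to start from a rich medium and drop each component whose removal still permits simulated growth. In the sketch below the growth test is a hypothetical stand-in for a flux balance simulation, and the requirement rule is invented for the example rather than taken from the H. influenzae model.

```python
def minimal_medium(components, can_grow):
    """Greedily drop components whose removal still permits growth.

    components is an ordered list of medium components.
    can_grow is a function taking a set of components and returning True or False.
    """
    medium = list(components)
    for comp in list(components):
        trial = [c for c in medium if c != comp]
        if can_grow(set(trial)):
            medium = trial
    return medium


# Hypothetical growth rule: three components are required, inosine is optional.
required = {"fructose", "ammonia", "phosphate"}


def toy_can_grow(medium):
    return required <= medium


rich = ["fructose", "ammonia", "phosphate", "inosine"]
result = minimal_medium(rich, toy_can_grow)
```

A greedy pass of this kind yields a medium from which nothing further can be removed, but the result can depend on the order in which components are tried when alternative nutrients can substitute for one another.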

While particular embodiments of the invention have been described in detail, it will be apparent to those skilled in 
the art that these embodiments are exemplary rather than limiting, and the true scope of the invention is defined by the 
claims that follow. 
Table 2

Comparison of the predicted mutant growth characteristics from the gene deletion study to published experimental 
results with single and double mutants. 
Gene Glucose Glycerol Succinate Acetate
(in vivo/in silico) (in vivo/in silico) (in vivo/in silico) (in vivo/in silico)
unc +/+ ■/■ -/. 

zwf +/+ 
sucAD +/+ 
zwf,pnt +/+ 

pck, mez -/- */• 

pck,pps */• 
pgi zwf •/- 
pgi gnd 

•/• 

tktA, tktB ■/• 

Results are scored as + or - meaning growth or no growth determined from in vivo / in silico data. In 73 of 80 cases
the in silico behavior is the same as the experimentally observed behavior.
WHAT IS CLAIMED IS:



1. A method for determining the genome specific stoichiometric matrix of an organism, comprising:



providing the nucleotide sequence of a metabolic gene in the organism; 
identifying the open reading frame of the metabolic gene; 

assigning a function to the metabolic gene based on its nucleotide or amino acid homology to other, 
known metabolic genes; 

determining the metabolic genotype of the organism based on the assigned function of the 
metabolic gene; and 

determining the genome specific stoichiometric matrix for the organism. 

2. The method of Claim 1, further comprising determining a phenotype of the organism. 

3. The method of Claim 2, wherein determining the phenotype of the organism comprises analyzing 
the consequences of reduction or addition to the composition of the metabolic genotype. 

4. The method of Claim 2, further comprising identifying lethal genetic deletions. 

5. The method of Claim 4 further comprising determining the effectiveness of a drug through analysis 
of the lethal genetic deletions. 

6. The method of Claim 1, further comprising determining the minimal media composition required to 
sustain growth of the organism. 

7. The method of Claim 1, further comprising determining an optimal media composition for growing
the organism.

8. The method of Claim 1, further comprising determining the most advantageous complement of
genes in the organism necessary to sustain growth in a particular environmental condition.

9. The method of Claim 1, wherein the organism is Escherichia coli.

10. The method of Claim 1, comprising the use of a Flux Based Analysis on the stoichiometric matrix.

11. The method of Claim 1, comprising adding biochemical information for a metabolic gene to the
stoichiometric matrix.

12. A method for determining a potential genetic target for a drug that kills an organism, comprising:
providing the nucleotide sequence of a metabolic gene in the organism;

identifying the open reading frame of the metabolic gene;

assigning a function to the metabolic gene based on its nucleotide or amino acid homology to other,
known metabolic genes;

determining whether the metabolic gene is required for growth of the organism;

repeating the providing, identifying, assigning and determining steps for other metabolic genes of
the organism; and

selecting a gene that is required for growth of the organism as a target for the drug.

13. The method of Claim 12, wherein the organism is Escherichia coli.

14. The method of Claim 12, comprising performing a Flux Based Analysis of a stoichiometric matrix
from the organism.

15. The method of Claim 12, comprising the use of biochemical information on the metabolic gene to 
determine whether it is required for growth of the organism. 

16. A computer system comprising a memory having instructions that when executed perform the steps
of:

providing the nucleotide sequence of a metabolic gene in an organism; 
identifying the open reading frame of the metabolic gene; 

assigning a function to the metabolic gene based on its nucleotide or amino acid homology to other,
known metabolic genes;

determining the metabolic genotype of the organism based on the assigned function of the 
unknown metabolic gene; and 

determining the genome specific stoichiometric matrix for the organism. 

17. The computer system of Claim 16, wherein said memory is selected from the group consisting of: a
hard disk, optical memory, Random Access Memory, Read Only Memory and Flash Memory.

18. The computer system of Claim 16, wherein said computer system is based on an Intel® 
microprocessor. 

19. The computer system of Claim 16, wherein the organism is Escherichia coli.

20. The computer system of Claim 16, further comprising instructions that when executed perform the
method of identifying lethal genetic deletions for the organism.

21. The computer system of Claim 16, comprising instructions, that when executed, add biochemical 
information on a metabolic gene to the stoichiometric matrix. 

22. A method for representing a living organism in a computer system, comprising: 
providing the nucleotide sequence of a metabolic gene in the organism; 

identifying the open reading frame of the metabolic gene;

assigning a function to the metabolic gene based on its nucleotide or amino acid homology to other, 
known metabolic genes; 

determining the metabolic genotype of the organism based on the assigned function of the 
metabolic gene; 

determining the genome specific stoichiometric matrix for the organism; and

storing the genome specific stoichiometric matrix in a memory of the computer. 

23. The method of Claim 22, wherein the organism is Escherichia coli.

24. The method of Claim 22, comprising the use of Flux Based Analysis to analyze the stoichiometric
matrix.

25. The method of Claim 22, comprising adding biochemical information on a metabolic gene to
determine the metabolic genotype of the organism.
26. The method of Claim 22, comprising calculating the genome specific stoichiometric matrix using
Flux Based Analysis.

27. A genome specific stoichiometric matrix representing the metabolism of a living organism, produced 
by a process comprising: 

providing the nucleotide sequence of a metabolic gene in the organism; 
identifying the open reading frame of the metabolic gene; 

assigning a function to the metabolic gene based on its nucleotide or amino acid homology to other, 
known metabolic genes; 

determining the metabolic genotype of the organism based on the assigned function of the 
metabolic gene; and 

determining the genome specific stoichiometric matrix for the organism. 

28. The stoichiometric matrix of Claim 1, wherein the organism is Escherichia coli.

29. The stoichiometric matrix of Claim 1, wherein the stoichiometric matrix is determined using Flux 
Based Analysis. 

30. The stoichiometric matrix of Claim 1, produced by the process of adding biochemical information 
for the metabolic gene. 